Goto

Collaborating Authors

 jetson orin nano


Computer Vision for Real-Time Monkeypox Diagnosis on Embedded Systems

Delgado-López, Jacob M., Morell-Rodriguez, Ricardo A., Rosario, Sebastián O. Espinosa-Del, Lugo-Beauchamp, Wilfredo E.

arXiv.org Artificial Intelligence

The rapid diagnosis of infectious diseases, such as monkeypox, is crucial for effective containment and treatment, particularly in resource-constrained environments. This study presents an AI-driven diagnostic tool developed for deployment on the NVIDIA Jetson Orin Nano, leveraging the pre-trained MobileNetV2 architecture for binary classification. The model was trained on the open-source Monkeypox Skin Lesion Dataset, achieving a 93.07% F1-Score, which reflects a well-balanced performance in precision and recall. To optimize the model, the TensorRT framework was used to accelerate inference for FP32 and to perform post-training quantization for FP16 and INT8 formats. TensorRT's mixed-precision capabilities enabled these optimizations, which reduced the model size, increased inference speed, and lowered power consumption by approximately a factor of two, all while maintaining the original accuracy. Power consumption analysis confirmed that the optimized models used significantly less energy during inference, reinforcing their suitability for deployment in resource-constrained environments. The system was deployed with a Wi-Fi Access Point (AP) hotspot and a web-based interface, enabling users to upload and analyze images directly through connected devices such as mobile phones. This setup ensures simple access and seamless connectivity, making the tool practical for real-world applications. These advancements position the diagnostic tool as an efficient, scalable, and energy-conscious solution to address diagnosis challenges in underserved regions, paving the way for broader adoption in low-resource healthcare settings.


VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking

Yang, Kichang, Kim, Seonjun, Kim, Minjae, Zhang, Nairan, Zhang, Chi, Lee, Youngki

arXiv.org Artificial Intelligence

Edge deployment of large Vision-Language Models (VLMs) increasingly relies on flash-based weight offloading, where activation sparsification is used to reduce I/O overhead. However, conventional sparsification remains model-centric, selecting neurons solely by activation magnitude and neglecting how access patterns influence flash performance. We present Neuron Chunking, an I/O-efficient sparsification strategy that operates on chunks (i.e., groups of contiguous neurons in memory) and couples neuron importance with storage access cost. The method models I/O latency through a lightweight abstraction of access contiguity and selects chunks with high utility, defined as neuron importance normalized by estimated latency. By aligning sparsification decisions with the underlying storage behavior, Neuron Chunking improves I/O efficiency by up to 4.65x and 5.76x on Jetson Orin Nano and Jetson AGX Orin, respectively.


Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges

Islam, Md Romyull, Deng, Bobin, Dhar, Nobel, Nguyen, Tu N., He, Selena, Shi, Yong, Suo, Kun

arXiv.org Artificial Intelligence

Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying smaller models (i.e., small language models (SLMs)) on edge devices offers additional advantages, such as reduced latency and independence from network connectivity. However, edge devices' limited computing resources and constrained energy budgets challenge efficient deployment. This study evaluates the power efficiency of five representative SLMs - Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2 on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano (CPU and GPU configurations). Results show that Jetson Orin Nano with GPU acceleration achieves the highest energy-to-performance ratio, significantly outperforming CPU-based setups. Llama 3.2 provides the best balance of accuracy and power efficiency, while TinyLlama is well-suited for low-power environments at the cost of reduced accuracy. In contrast, Phi-3 Mini consumes the most energy despite its high accuracy. In addition, GPU acceleration, memory bandwidth, and model architecture are key in optimizing inference energy efficiency. Our empirical analysis offers practical insights for AI, smart systems, and mobile ad-hoc platforms to leverage tradeoffs from accuracy, inference latency, and power efficiency in energy-constrained environments.


6D Strawberry Pose Estimation: Real-time and Edge AI Solutions Using Purely Synthetic Training Data

Sinha, Saptarshi Neil, Kühn, Julius, Goschke, Mika Silvan, Weinmann, Michael

arXiv.org Artificial Intelligence

Automated and selective harvesting of fruits has become an important area of research, particularly due to challenges such as high costs and a shortage of seasonal labor in advanced economies. This paper focuses on 6D pose estimation of strawberries using purely synthetic data generated through a procedural pipeline for photorealistic rendering. W e employ the YOLOX-6D-Pose algorithm, a single-shot approach that leverages the YOLOX backbone, known for its balance between speed and accuracy, and its support for edge inference. T o address the lacking availability of training data, we introduce a robust and flexible pipeline for generating synthetic strawberry data from various 3D models via a procedural Blender pipeline, where we focus on enhancing the realism of the synthesized data in comparison to previous work to make it a valuable resource for training pose estimation algorithms. Quantitative evaluations indicate that our models achieve comparable accuracy on both the NVIDIA RTX 3090 and Jetson Orin Nano across several ADD-S metrics, with the RTX 3090 demonstrating superior processing speed. However, the Jetson Orin Nano is particularly suited for resource-constrained environments, making it an excellent choice for deployment in agricultural robotics. Qualitative assessments further confirm the model's performance, demonstrating its capability to accurately infer the poses of ripe and partially ripe strawberries, while facing challenges in detecting unripe specimens. This suggests opportunities for future improvements, especially in enhancing detection capabilities for unripe strawberries (if desired) by exploring variations in color . Furthermore, the methodology presented could be adapted easily for other fruits such as apples, peaches, and plums, thereby expanding its applicability and impact in the field of agricultural automation.


EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices

Aalishah, Romina, Navardi, Mozhgan, Mohsenin, Tinoosh

arXiv.org Artificial Intelligence

Deployment of efficient and accurate Deep Learning models has long been a challenge in autonomous navigation, particularly for real-time applications on resource-constrained edge devices. Edge devices are limited in computing power and memory, making model efficiency and compression essential. In this work, we propose EdgeNavMamba, a reinforcement learning-based framework for goal-directed navigation using an efficient Mamba object detection model. To train and evaluate the detector, we introduce a custom shape detection dataset collected in diverse indoor settings, reflecting visual cues common in real-world navigation. The object detector serves as a pre-processing module, extracting bounding boxes (BBOX) from visual input, which are then passed to an RL policy to control goal-oriented navigation. Experimental results show that the student model achieved a reduction of 67% in size, and up to 73% in energy per inference on edge devices of NVIDIA Jetson Orin Nano and Raspberry Pi 5, while keeping the same performance as the teacher model. EdgeNavMamba also maintains high detection accuracy in MiniWorld and IsaacLab simulators while reducing parameters by 31% compared to the baseline. In the MiniWorld simulator, the navigation policy achieves over 90% success across environments of varying complexity.


Joint Memory Frequency and Computing Frequency Scaling for Energy-efficient DNN Inference

Han, Yunchu, Nan, Zhaojun, Zhou, Sheng, Niu, Zhisheng

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have been widely applied in diverse applications, but the problems of high latency and energy overhead are inevitable on resource-constrained devices. To address this challenge, most researchers focus on the dynamic voltage and frequency scaling (DVFS) technique to balance the latency and energy consumption by changing the computing frequency of processors. However, the adjustment of memory frequency is usually ignored and not fully utilized to achieve efficient DNN inference, which also plays a significant role in the inference time and energy consumption. In this paper, we first investigate the impact of joint memory frequency and computing frequency scaling on the inference time and energy consumption with a model-based and data-driven method. Then by combining with the fitting parameters of different DNN models, we give a preliminary analysis for the proposed model to see the effects of adjusting memory frequency and computing frequency simultaneously. Finally, simulation results in local inference and cooperative inference cases further validate the effectiveness of jointly scaling the memory frequency and computing frequency to reduce the energy consumption of devices.


Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation

Kallakurik, Uttej, Humes, Edward, Jonna, Rithvik, Lin, Xiaomin, Mohsenin, Tinoosh

arXiv.org Artificial Intelligence

--Large Language Models (LLMs) have significant impact on the healthcare scenarios but remain prohibitively large for deployment in real-time, resource-constrained environments such as edge devices. In this work, we introduce a novel medical assistant system, optimized through our general-purpose compression framework, which tailors Large Language Models (LLMs) for deployment in specialized domains. By measuring neuron saliency on domain-specific data, our method can aggressively prune irrelevant neurons, reducing model size while preserving performance. Following pruning, we apply post-training quantization to further reduce the memory footprint, and evaluate the compressed model across medical benchmarks including MedMCQA, MedQA, and PubMedQA. We also deploy the 50% compressed Gemma and the 67% compressed LLaMA3 models on Jetson Orin Nano (18.7W peak) and Raspberry Pi 5 (6.3W peak), achieving real-time, energy-efficient inference under hardware constraints.


Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI

Chakravarty, Aditya

arXiv.org Artificial Intelligence

Diffusion models have shown remarkable flexibility for solving inverse problems without task-specific retraining. However, existing approaches such as Manifold Preserving Guided Diffusion (MPGD) apply only a single gradient update per denoising step, limiting restoration fidelity and robustness, especially in embedded or out-of-distribution settings. In this work, we introduce a multistep optimization strategy within each denoising timestep, significantly enhancing image quality, perceptual accuracy, and generalization. Our experiments on super-resolution and Gaussian deblurring demonstrate that increasing the number of gradient updates per step improves LPIPS and PSNR with minimal latency overhead. Notably, we validate this approach on a Jetson Orin Nano using degraded ImageNet and a UAV dataset, showing that MPGD, originally trained on face datasets, generalizes effectively to natural and aerial scenes. Our findings highlight MPGD's potential as a lightweight, plug-and-play restoration module for real-time visual perception in embodied AI agents such as drones and mobile robots.


An Edge AI Solution for Space Object Detection

Zhang, Wenxuan, Hu, Peng

arXiv.org Artificial Intelligence

Effective Edge AI for space object detection (SOD) tasks that can facilitate real-time collision assessment and avoidance is essential with the increasing space assets in near-Earth orbits. In SOD, low Earth orbit (LEO) satellites must detect other objects with high precision and minimal delay. We explore an Edge AI solution based on deep-learning-based vision sensing for SOD tasks and propose a deep learning model based on Squeeze-and-Excitation (SE) layers, Vision Transformers (ViT), and YOLOv9 framework. We evaluate the performance of these models across various realistic SOD scenarios, demonstrating their ability to detect multiple satellites with high accuracy and very low latency.


NVIDIA's latest compact generative AI supercomputer is also its cheapest

Engadget

NVIDIA has just revealed the Jetson Orin Nano Super Developer Kit, which is the successor to its Jetson Orin Nano kit from 2022. This new compact generative AI supercomputer can fit into the palm of your hand. Included in the developer kit is an 8GB Jetson Orin Nano system-on-module and a reference carrier board. In terms of performance, the Jetson Orin Nano Super can reach 68 trillion operations per second (TOPS), a 70 percent increase from its predecessor. NVIDIA also claims a 1.7 times improvement in generative AI inference performance and a 50 percent bandwidth increase to 102GB per second.